ABSTRACT

There is a growing demand for surveillance systems that can detect fall-down
events because of the increasing number of surveillance cameras being
installed in public indoor and outdoor locations. Fall-down event
detection has been extensively researched for safety purposes,
particularly to monitor elderly people, patients, and toddlers. Such computer
vision detectors have become more affordable with the development of
high-speed computer networks and low-cost video cameras. This paper proposes
a moving object detection method based on human motion analysis for human
fall-down events. The method comprises three parts: a
preprocessing part to reduce image noise, a motion detection part that uses the
TV-L1 optical flow algorithm, and a performance measure part. The last part
analyzes the results of the object detection part in terms of the bounding
boxes, which are compared with the given ground truth. The proposed
method is tested on the Fall Down Detection (FDD) dataset and compared with
Gunnar-Farneback optical flow by measuring the intersection over union (IoU)
of the output with respect to the ground truth bounding box.
The experimental results show that the proposed method achieves an average
IoU of 0.92524.

1. INTRODUCTION

Object detection is a research topic in computer vision that detects the presence of
objects of interest and locates their positions. The main task of an object detection method is to segment any
moving objects from the background [1]. The segmented objects of interest are then labeled as
‘foreground’ and the rest of the pixels are labeled as ‘background’. With the advancement of the image and video
fields, many complex automatic systems can be designed with the help of high-resolution cameras and
high-speed computer networks. Hence, the role of automatic detection algorithms has become more important,
especially for applications in daily life such as behavioral analysis [2-3], urban surveillance
[4-5], and object recognition [6-8].

A human fall-down event is defined as an incident in which the body of the person of interest
comes to rest unintentionally on the ground or any other lower surface [9]. A fall-down event also
takes place when a person slips unexpectedly while walking or standing. The World Health Organization
(WHO) [10] has reported that fall-down events are a major public health concern and the second
leading cause of unintentional injury death worldwide after road accidents. In addition, WHO reported
that an estimated 646 000 individuals die globally each year because of fall-related incidents.
Figure 1 shows some samples of fall-down events for various fall postures.

Figure 1. Image samples of fall-down events

Therefore, an automatic fall-down event sensor is crucial for applications in hospitals, homes for the
elderly, and other public places. Early detection of the event is important to shorten the “long lie”, which
is the period during which a subject remains lying on the floor after the fall incident. It is the key factor that
determines the severity of the health impact of a fall-down event. Clinical studies have shown that a long lie
usually leads to dehydration, hypothermia and pressure sores [11, 12]. In addition, it might also lead to
psychological consequences such as loss of independence, fear of falling and trauma [13].

The proposed work is based on a vision approach that does not require complex
processing of the videos. The basis of the fall detector is the TV-L1 optical flow algorithm. The main motivation
behind this work is that a relatively large number of people die owing to late
awareness and treatment after a fall-down event has occurred. Hence, an automatic and
efficient system based on motion analysis is much needed to mitigate this problem. This paper is organized as
follows: Section 2 discusses related work on object detection in fall-down events. Section 3 explains the
proposed method of object detection. Section 4 shows the experimental results and performance comparison,
and a conclusion is provided in Section 5.


2. RELATED WORK 

Object detection techniques for fall-down events have become a major subtopic in the computer vision
and image processing fields. Generally, these techniques are widely explored because they are less
intrusive, robust, and easy to implement in various environments. In addition, a vast
amount of information can be extracted concurrently from the surveillance cameras, such as the motions,
locations or actions of the monitored person of interest [14]. Typically, object detection methods for
fall-down events can be divided into two approaches: background subtraction and optical flow.

2.1. Background subtraction 

Background subtraction is the most frequently used method in fall-down event detection. It finds
the differences between the established background model and the current image so that the foreground
object can be separated from the background. It is used mainly for static camera setups. In general, background
subtraction techniques provide fast computation with good accuracy. Poonsri & Chiracharit [15]
used a mixture of Gaussians model (MoG), which is a statistical approach, to extract the foreground objects.
Then, they merged the results of the MoG method with the mean filter. They also implemented some
morphological operations to remove noise and improve foreground detection accuracy. However, MoG is
sensitive to all moving objects, which usually leads to false foreground detections.
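The basic idea of comparing the current image against a background model can be sketched in a few lines. The example below is a minimal illustration that assumes the simplest possible model, a single fixed background image with a hand-picked threshold; it is not the MoG model of [15], and the function name `foreground_mask` and the threshold value are hypothetical.

```python
def foreground_mask(background, frame, threshold=25):
    """Label a pixel as foreground (1) when it differs from the background
    model by more than `threshold` gray levels, otherwise background (0).

    Both images are 2-D lists of gray levels with the same shape.
    """
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Static background of gray level 100 and a frame in which a bright object
# covers two pixels of the middle row.
background = [[100] * 4 for _ in range(3)]
frame = [[100, 100, 100, 100],
         [100, 180, 175, 100],
         [100, 105, 100, 100]]
mask = foreground_mask(background, frame)
```

The small change of 5 gray levels in the last row stays below the threshold and is kept as background, which is how a fixed threshold suppresses mild noise; statistical models such as MoG replace the fixed image and threshold with per-pixel distributions.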

Yu et al. [16, 17] applied a background subtraction method using the codebook algorithm to extract the
foreground silhouette for a single camera system. They argued that the codebook algorithm can achieve better
performance by utilizing more comprehensive information from the color space. They also stated that their
approach is capable of coping with illumination changes and adaptive parametric variation since no assumption
is made, in contrast to other background methods such as MoG and the single-mode background subtraction
method in [18, 19].

Wang et al. [20] implemented the VIBE-h background subtraction method to extract the foreground
object. Then, they performed connected component analysis to combine and label the components as the
foreground. Besides that, Yun et al. [21] applied background subtraction using Gaussian Mixture
Models (GMM) to depth images instead of RGB images. The foreground is extracted by letting the mixture
models determine the most probable class of each pixel by modelling the intensity values. They further
implemented a series of morphological operations to remove noise and obtain a clean silhouette of the
foreground object.

2.2. Optical flow 

Optical flow [22] is defined as the pattern of apparent motion of the objects in a video.
The motions, which are usually represented by velocities, are estimated from corresponding points in two
consecutive frames. Optical flow can give complete movement information for the whole frame and is suitable
for implementation as a moving object detector. Bhandari et al. [23] used Lucas-Kanade optical flow for the
motion estimation of the foreground. Their algorithm first finds points of interest using the Shi-Tomasi method
applied to the output of Harris-Stephens corner detection with several threshold parameters. A point of
interest is kept if its flow matches a corner point and their distance
difference is less than a small number. Otherwise, the point is discarded. The same process is repeated
until the end of the video.

The work in [21] utilized the Horn-Schunck and Lucas-Kanade optical flow methods to detect the motion of
foreground objects in RGB depth videos. Then, they extracted histogram-based features of the optical flow. This
histogram is a useful measure to describe the motions of the foreground pixels, and is later used to classify the
fall-down event.

Belshaw et al. [24] applied Farneback optical flow to represent the motion of the foreground blob
between consecutive video frames. Optical flow is also incorporated into the background adaptation approach so
that the background model can be updated to cater for multiple active region blobs. Specifically, the magnitude of the
optical flow is used to control the background adaptation rates. This method is devised on the assumption that
motion cues can be used to remove stationary blobs as well as to identify lighting changes.

Alaoui et al. [25] proposed combining the Farneback optical flow algorithm with the
von Mises distribution to identify the moving object. Background subtraction and
morphological operations are performed first before extracting the foreground pixels. Then, optical flow is
used to calculate the motion vectors of the foreground object. Later, the von Mises distribution is used to
calculate the mean direction of the vectors. Apart from foreground extraction, the proposed method is able to
determine the orientation vectors of the object before and after the event.


3. RESEARCH METHOD 

Figure 2 shows the flowchart of the proposed moving object detection framework, which comprises
three stages: preprocessing, object detection and performance measure. The proposed method utilizes the optical
flow approach to detect the moving object, which is a single person in the walk and fall videos.

Figure 2. Flowchart of the proposed framework


3.1. Preprocessing 

The preprocessing part focuses on producing a better input image for the moving object detection part. Image
filtering using a 5×5 averaging filter as in (1) is implemented to smooth out the noise in the input videos. The
underlying assumption is that raw input videos are not suitable for direct processing because of the
noisy pixels and illumination variations present in the video. Therefore, preprocessing
needs to be performed first to reduce the effect of these issues. Then, each video frame is
cropped to obtain the region that contains only the moving objects.
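As an illustration of the smoothing step, the following minimal sketch applies a 5×5 averaging (box) filter to a grayscale frame stored as a list of rows. The function name `box_filter` and the replicate-edge border handling are assumptions of this sketch; the paper does not state a border mode, and in practice a library routine such as OpenCV's `cv2.blur` would normally be used instead of explicit loops.

```python
def box_filter(image, size=5):
    """Smooth a 2-D grayscale image (list of rows) with a size x size mean filter.

    Border pixels are handled by replicating the nearest edge pixel, which is
    an assumption of this sketch rather than a detail taken from the paper.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    output = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0.0
            for d_row in range(-radius, radius + 1):
                for d_col in range(-radius, radius + 1):
                    # Clamp coordinates so the window never leaves the image.
                    r = min(max(row + d_row, 0), height - 1)
                    c = min(max(col + d_col, 0), width - 1)
                    total += image[r][c]
            output[row][col] = total / (size * size)
    return output


# A flat frame with a single noisy pixel: the spike of 35 is averaged with
# its 24 neighbors of value 10.
frame = [[10.0] * 7 for _ in range(7)]
frame[3][3] = 35.0
smoothed = box_filter(frame)
```

In this example the isolated spike is reduced from 35 to 11 at its own position, while pixels whose 5×5 window does not contain the spike keep the value 10.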

3.2. Object detection

Optical flow is a moving object detection method that approximates the apparent motion
of pixel brightness between two consecutive frames. Optical flow can also be used to represent the
velocity of the pixels. The TV-L1 optical flow [26] is used in this work to detect the
moving object, based on the OpenCV implementation. The TV-L1 optical flow is chosen because of its better
performance under various lighting conditions compared to other optical flow algorithms [13]. It is also a fast
algorithm with comparable accuracy and the ability to deal with occluded areas so that flow
distortion can be prevented [27].

In general, TV-L1 optical flow is one of the variational methods for optical flow estimation, and
it has become popular and extensively researched because of its robustness and accuracy. The
underlying idea of this optical flow is that the brightness of corresponding points in two images remains the same
under motion, which is known as the brightness constancy assumption [28]. TV-L1 optical flow is defined as a
combination of the brightness and gradient constancy assumptions, but with a different weighting under the
Chambolle function in the regularization term [29] with respect to the classical Horn-Schunck approach [30].
Moreover, previous studies have shown that combining the brightness and gradient constancy assumptions
leads to robust flow estimation under various illumination changes [31] and image noise. The average
magnitude of the optical flow vectors is computed in each video frame to infer the predicted bounding box. These
boxes are then used in the next module to measure performance in identifying the
fall-down event.
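The paper does not spell out how the bounding box is derived from the average flow magnitude. One plausible reading, sketched below under that assumption, is to treat every pixel whose flow magnitude exceeds the frame average as moving and to take the tightest box around those pixels. The function name `flow_bounding_box` and the thresholding rule are hypothetical, and the flow components are passed as plain 2-D lists rather than the arrays an optical flow library would return.

```python
import math


def flow_bounding_box(flow_u, flow_v):
    """Return (average_magnitude, box) for one frame of dense optical flow.

    flow_u and flow_v hold the horizontal and vertical flow components.
    box is (x_min, y_min, x_max, y_max) in pixel indices, or None when no
    pixel exceeds the average magnitude.
    Assumption of this sketch: a pixel is "moving" if its magnitude is above
    the frame average.
    """
    height, width = len(flow_u), len(flow_u[0])
    magnitude = [
        [math.hypot(flow_u[r][c], flow_v[r][c]) for c in range(width)]
        for r in range(height)
    ]
    average = sum(sum(row) for row in magnitude) / (height * width)
    moving = [
        (c, r)
        for r in range(height)
        for c in range(width)
        if magnitude[r][c] > average
    ]
    if not moving:
        return average, None
    xs = [x for x, _ in moving]
    ys = [y for _, y in moving]
    return average, (min(xs), min(ys), max(xs), max(ys))


# Synthetic 6 x 8 flow field: a 2 x 3 patch moves by (3, 4) pixels per frame.
flow_u = [[0.0] * 8 for _ in range(6)]
flow_v = [[0.0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        flow_u[r][c], flow_v[r][c] = 3.0, 4.0

average, box = flow_bounding_box(flow_u, flow_v)
```

For this synthetic field the six moving pixels each have magnitude 5, so the frame average is 0.625 and the resulting box is (3, 2, 5, 3). On real footage a noisy flow field would add spurious pixels above the average, which is one reason the smoothing step in Section 3.1 matters.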

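Let $\mathbf{u} = (u, v)$ be the displacement field at pixel coordinate $\mathbf{x} = (x, y)$. The optical flow can be
written in the non-linear formulation (2), where $I_i$ and $I_{i+1}$ are the current and next frames, respectively.
The equation can be linearized using a Taylor expansion as in (3), with $\mathbf{u}^0$ as an approximation to $\mathbf{u}$.

$$I_{i+1}(\mathbf{x} + \mathbf{u}) - I_i(\mathbf{x}) = 0 \qquad (2)$$

$$\rho(\mathbf{u}) = \nabla I_{i+1}(\mathbf{x} + \mathbf{u}^0) \cdot (\mathbf{u} - \mathbf{u}^0) + I_{i+1}(\mathbf{x} + \mathbf{u}^0) - I_i(\mathbf{x}) \qquad (3)$$

Equation (3) assumes that pixel intensities are constant over time, which is not realistic in real-life scenarios. Thus,
this equation can be modeled with an additional function $\omega$ with weight $\gamma$ as in (4), where
$I_t = I_{i+1}(\mathbf{x} + \mathbf{u}^0) - I_i(\mathbf{x})$ is the temporal difference that appears in (3).

$$\rho(\mathbf{u}) = (\nabla I_{i+1})^{T} (\mathbf{u} - \mathbf{u}^0) + I_t + \gamma\,\omega \qquad (4)$$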


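The $L^1$ penalization of both the regularization and data terms can be optimized by minimizing the
energy function in (5), where $\lambda$ is the trade-off between the regularization and data terms. However, (5) is not
trivial to solve as an optimization problem. Thus, (6) is used instead, which introduces a
convex relaxation term in which $\mathbf{p}$ is an auxiliary variable close to $\mathbf{u}$ and $\theta$ is a constant; the goal is to
minimize this energy function.

$$E = \int_{\Omega} \left[ \lambda \lVert \rho(u, v) \rVert_1 + \lVert \nabla u \rVert_1 + \lVert \nabla v \rVert_1 \right] d\mathbf{x} \qquad (5)$$

$$E = \int_{\Omega} \left[ \lambda \lVert \rho(\mathbf{p}) \rVert_1 + \lVert \nabla u \rVert_1 + \lVert \nabla v \rVert_1 + \frac{1}{2\theta} \lVert \mathbf{u} - \mathbf{p} \rVert^2 \right] d\mathbf{x} \qquad (6)$$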
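The paper does not describe how the relaxed energy is minimized. A common strategy in dual TV-L1 solvers is to alternate between updating the flow with the auxiliary variable fixed and updating the auxiliary variable with the flow fixed; the second update has a closed-form, per-pixel solution known as the thresholding step. The sketch below illustrates only that step, assuming a data term of the form $\lambda\,|\rho(\mathbf{p})| + \lVert \mathbf{u} - \mathbf{p} \rVert^2 / (2\theta)$ with a residual that is linear in the flow and without the illumination term $\gamma\omega$. The function name `threshold_step` and the argument layout are hypothetical.

```python
def threshold_step(u, grad, rho_u, lam, theta):
    """Per-pixel minimizer p of  lam * |rho(p)| + |u - p|^2 / (2 * theta).

    u      -- current flow (u, v) at this pixel
    grad   -- spatial gradient (Ix, Iy) of the warped next frame
    rho_u  -- linearized residual rho evaluated at u
    Because rho is linear, rho(p) = rho_u + grad . (p - u).
    """
    grad_sq = grad[0] ** 2 + grad[1] ** 2
    if grad_sq == 0.0:
        # The residual does not depend on p, so p = u minimizes the energy.
        return u
    bound = lam * theta * grad_sq
    if rho_u < -bound:
        step = lam * theta          # residual stays negative after the move
    elif rho_u > bound:
        step = -lam * theta         # residual stays positive after the move
    else:
        step = -rho_u / grad_sq     # move exactly onto rho(p) = 0
    return (u[0] + step * grad[0], u[1] + step * grad[1])


def residual_at(p, u, grad, rho_u):
    """Evaluate the linearized residual at p."""
    return rho_u + grad[0] * (p[0] - u[0]) + grad[1] * (p[1] - u[1])


u = (0.0, 0.0)
grad = (1.0, 0.0)
small = threshold_step(u, grad, rho_u=0.1, lam=1.0, theta=0.25)
large = threshold_step(u, grad, rho_u=2.0, lam=1.0, theta=0.25)
```

With a small residual the update cancels it completely, whereas a large residual only produces a bounded move of length $\lambda\theta\,\lVert \nabla I \rVert$. This bounded response is what makes the $L^1$ data term tolerant of outliers.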


3.3. Performance measure 

The ground truth location of the moving object in each frame is annotated manually using a
bounding box that surrounds the object. These boxes are then used to evaluate the Intersection over Union
(IoU) between the ground truth boxes and the output boxes of the proposed method. The IoU, also known
as the Jaccard index, is defined in (7) and illustrated in Figure 3.


$$\mathrm{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \qquad (7)$$

Figure 3. Illustration of the intersection over union (IoU)
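The definition in (7) translates directly into code for axis-aligned bounding boxes. The sketch below assumes boxes are given as `(x_min, y_min, x_max, y_max)` in continuous coordinates, so that the area is simply width times height; the box format and the function name `iou` are choices made for this illustration, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max). Returns a value in [0, 1];
    two degenerate boxes with zero total area give 0.0.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero when boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection  # |A| + |B| - |A n B|
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
predicted = (5, 5, 15, 15)
score = iou(ground_truth, predicted)
```

Here the overlap is a 5 by 5 square and the union is 100 + 100 - 25 = 175, so the score is 25/175, about 0.143. Identical boxes score 1 and disjoint boxes score 0.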


4. RESULTS AND ANALYSIS 

The proposed framework is tested using the Fall Detection Dataset (FDD) [32], which is an online
fall-down database that comprises a total of 126 annotated videos. These videos contain four different stages with
nine different persons acting out the fall incident. The frame rate is 25 frames/sec with a frame size of
320 × 240 pixels. The proposed framework is implemented in Python 3.6 on a desktop computer with an
Intel Core i7 3.4 GHz processor and 12 GB of RAM.

Figure 4 shows sample results of the proposed method on the FDD database. The performance of
the proposed method is compared with the Gunnar-Farneback optical flow [33], which is another type of
dense optical flow. Figure 4(a) shows the sequential frames of the FDD, while Figure 4(b) and Figure 4(c)
show the corresponding optical flow images of the TV-L1 and Gunnar-Farneback optical flow.




Figure 4. (a) Samples of sequential frames of the Fall Detection Dataset, (b) corresponding optical flow images
of the TV-L1 algorithm, (c) corresponding optical flow images of the Gunnar-Farneback algorithm


Figures 5 and 6 show graphs of the average flow magnitude of the TV-L1 and Gunnar-Farneback
algorithms for all sequential frames in Figure 4. From the graphs, the Gunnar-Farneback optical flow method
produces a higher average magnitude than TV-L1 optical flow. This is because Gunnar-Farneback
does not cope well with noise: it treats noisy pixels as moving pixels, which raises the average motion magnitude.

Figure 5. Average flow magnitude of TV-L1 optical flow for all sequential frames in Figure 4

Figure 6. Average flow magnitude of Gunnar-Farneback optical flow for all sequential frames in Figure 4


The IoU test is then performed on the outputs of both optical flow algorithms. The IoU is illustrated
in Figure 7, where the green bounding box is the ground truth from the FDD and the red bounding box
is the output bounding box of the optical flow. Table 1 lists the computed IoU results of both TV-L1 and
Gunnar-Farneback optical flow for 14 videos. On average, the IoU of TV-L1 optical flow is higher than that
of the Gunnar-Farneback method, at 0.92524 compared to 0.92346. However, the Gunnar-Farneback
method produces higher IoU results for Videos 3, 8 and 9 because these videos have low noise. Although the
IoU differences are small, they still have a considerable impact on fall-down detection, especially during the
transition period just before and after the fall-down incident.



Figure 7. Illustration of the IoU


Table 1. IoU results for TV-L1 and Gunnar-Farneback optical flow

            Intersection over union (IoU)
#Video      TV-L1       Gunnar-Farneback
1           0.92227     0.92189
2           0.91823     0.91685
3           0.91786     0.91998
4           0.91289     0.91086
5           0.91776     0.91516
6           0.94445     0.93651
7           0.92682     0.92467
8           0.93954     0.94199
9           0.93460     0.93694
10          0.91080     0.90927
11          0.93145     0.92838
12          0.92236     0.91979
13          0.94605     0.93929
14          0.90834     0.90697
Average     0.92524     0.92346
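The summary row of Table 1 can be checked directly from the per-video values. The short script below copies the 14 IoU pairs from the table, recomputes both means, and lists the videos on which Gunnar-Farneback scores higher; the variable names are chosen for this illustration only.

```python
from statistics import mean

# Per-video IoU values copied from Table 1 (Videos 1 to 14).
IOU_TVL1 = [0.92227, 0.91823, 0.91786, 0.91289, 0.91776, 0.94445, 0.92682,
            0.93954, 0.93460, 0.91080, 0.93145, 0.92236, 0.94605, 0.90834]
IOU_FARNEBACK = [0.92189, 0.91685, 0.91998, 0.91086, 0.91516, 0.93651, 0.92467,
                 0.94199, 0.93694, 0.90927, 0.92838, 0.91979, 0.93929, 0.90697]

mean_tvl1 = mean(IOU_TVL1)
mean_farneback = mean(IOU_FARNEBACK)

# Videos (1-based numbering) on which Gunnar-Farneback has the higher IoU.
farneback_wins = [
    video
    for video, (tvl1, farneback) in enumerate(zip(IOU_TVL1, IOU_FARNEBACK), start=1)
    if farneback > tvl1
]

print(f"TV-L1 mean IoU:            {mean_tvl1:.5f}")
print(f"Gunnar-Farneback mean IoU: {mean_farneback:.5f}")
print(f"Mean difference:           {mean_tvl1 - mean_farneback:.5f}")
print(f"Farneback higher on videos {farneback_wins}")
```

With these values, Gunnar-Farneback scores higher only on Videos 3, 8 and 9, and both recomputed means agree with the reported averages to within 0.00001.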


Bulletin of Electr Eng and Inf, Vol. 8, No. 3, September 2019: 839-846
ISSN: 2302-9285


5. CONCLUSION 

In conclusion, a moving object detection method using TV-L1 optical flow for fall-down videos has been
proposed and tested. The proposed framework starts with a preprocessing step, followed by the computation
of the TV-L1 optical flow. The average flow magnitude is then computed for
each frame to obtain the output bounding box. Then, this box is compared with the ground truth data using
the IoU test. The performance of the proposed method is benchmarked against the Gunnar-Farneback optical flow.
Based on the experimental results, TV-L1 optical flow achieved an average IoU of 0.92524, which
outperforms the Gunnar-Farneback optical flow. For future work, the detector can be further improved by using
additional features such as tilt angle, middle points, and motion speed.